influencing the outcome more, because they are fighting over explaining the variability in the
dependent variable. Although models with collinearity are valid, they are hard to interpret if you are
looking for cause-and-effect relationships (that is, if you are doing causal inference). Chapter 20
provides philosophical guidance on dealing with collinearity in modeling.
Calculating How Many Participants You Need
Studies should aim to enroll a sample large enough to give a high probability (the study's power) of
obtaining a statistically significant result for the primary research hypothesis if the effect being
tested in that hypothesis is large enough to be of clinical importance. So if the main hypothesis of your
study will be tested with a multiple regression analysis, you should, in theory, calculate the sample
size needed to support that analysis.
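Unfortunately, such a calculation is usually impractical, because the equations are complicated and
would require you to specify in advance the expected effect of every predictor and how the predictors
relate to one another.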
Instead, planning focuses more on being able to gather enough data to support the planned
regression model. Imagine that you plan to collect a categorical variable where you expect
only 5 percent of the participants to fall in a particular level. With 100 participants, you would expect
only about 5 people in that level, which is very little data on which to base an estimate for it. If it is
important to include that level in your regression analysis, you would need to greatly increase your
target sample size. Although regression models tend to converge in software if they include at least
100 rows, that may not be true, depending on the number and distribution of the values in the
predictor variables and the outcome. It is best to draw on experience from similar studies to help you
develop a target sample size and analytic plan for a multiple regression analysis.
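The 5 percent example can be put into numbers. The Python sketch below is a hypothetical illustration, not a method given in this chapter. It assumes each participant falls into the rare level independently with a fixed probability, so the count in that level follows a binomial distribution. The goal of at least 20 participants in the level and the 90 percent assurance are arbitrary choices for illustration, and the helper names `expected_count_n`, `prob_at_least`, and `n_for_assurance` are made up.

```python
import math

def expected_count_n(percent_in_level, min_in_level):
    """Smallest N whose *expected* count in the rare level reaches min_in_level.

    Integer arithmetic for ceil(min_in_level * 100 / percent_in_level),
    which avoids floating-point rounding surprises.
    """
    return -(-min_in_level * 100 // percent_in_level)

def prob_at_least(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p): the chance that at least k of n
    participants land in a level with prevalence p (assumes independence)."""
    below_k = sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below_k

def n_for_assurance(p, k, assurance=0.90):
    """Smallest n giving at least `assurance` probability of >= k participants
    in the level. P(X >= k) only grows with n, so a linear search works."""
    n = k
    while prob_at_least(n, p, k) < assurance:
        n += 1
    return n

if __name__ == "__main__":
    # Level expected to hold 5 percent of participants (the text's example);
    # hypothetical goal of at least 20 participants in that level.
    print("N for an expected count of 20:", expected_count_n(5, 20))
    print("N for a 90% chance of at least 20:", n_for_assurance(0.05, 20))
    print("Chance of at least 20 with 100 rows:", round(prob_at_least(100, 0.05, 20), 6))
```

Aiming only for an expected count of 20 already calls for 400 participants, and asking for a high probability of actually reaching 20 pushes the target above 400. This is why a single rare level can inflate the target sample size so much.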